Ranking spreaders by decomposing complex networks

Ranking the nodes' ability for spreading in networks is a fundamental problem which relates
to many real applications such as information and disease control. In the previous literature, a
network decomposition procedure called the k-shell method has been shown to effectively identify
the most influential spreaders. In this paper, we find that the k-shell method has some limitations
when it is used to rank all the nodes in the network. We also find that these limitations are due
to considering only the links between the remaining nodes (residual degree) while entirely ignoring
all the links connecting to the removed nodes (exhausted degree) when decomposing the networks.
Accordingly, we propose a mixed degree decomposition (MDD) procedure in which both the residual
degree and the exhausted degree are considered. By simulating the epidemic process on real
networks, we show that the MDD method can outperform the k-shell and the degree methods in
ranking spreaders. Finally, the influence of the network structure on the performance of the MDD
method is discussed.

I. INTRODUCTION

Spreading is an important process that occurs widely in fields including physics, chemistry,
medical science, biology and sociology. For example, reaction-diffusion processes [2],
pandemics, cascading failures in electric power grids and information dissemination can all be
naturally characterized within the framework of spreading. In particular, spreading in complex
networks has been intensively studied in the past decade. Many studies have revealed that the
spreading process is strongly influenced by the network topology. With an understanding of the
spreading pathways on networks, many methods have been developed to manipulate the network
structure in order to control the spreading threshold. Moreover, in order to prevent the wide
propagation of disease, various efficient immunization strategies have also been proposed.

Although many earlier works were dedicated to understanding and controlling the spreading
process in a macroscopic sense, increasing attention has recently been paid to the microscopic
study of the spreadability of each node, i.e., how many nodes will finally be covered when the
spreading originates from this single node. Knowledge of node spreadability is crucial for
developing efficient methods either to decelerate spreading in the case of diseases or to speed
up spreading in the case of information flow. Moreover, it can help to identify the initial
spreader of a certain disease or piece of information [18]. Although the most connected nodes
(hubs) and the nodes with high betweenness centrality are commonly believed to be the most
influential spreaders in networks, the k-shell (also called k-core) method has been found to
perform better in identifying the best individual spreaders. The k-shell method starts by
removing all nodes with only one connection (together with their links), repeating this until
no such nodes remain, and assigns the removed nodes to the 1-shell. For each remaining node, the
number of links connecting to the other remaining nodes is called its residual degree, and the
number of links connecting to the removed nodes is called its exhausted degree. After the
1-shell has been assigned, all nodes with residual degree 2 are recursively removed to create
the 2-shell. This procedure continues as the residual degree increases until all nodes in the
network have been assigned to one of the shells. Nodes with a high k-shell value tend to be
located in the center of the network, and spreading that starts from any of these nodes is
likely to cover the network widely. A similar idea has also been applied to assign directions
to the links of undirected networks, which can significantly improve synchronizability [10, 21].

We find, however, that the k-shell method has several limitations when it is used to rank the
spreadability of all the nodes. First, it assigns the same rank to many nodes even though they
perform entirely differently in spreading. The extreme examples are the tree network and the
Barabasi-Albert (BA) network, in which the k-shell method assigns every node to the same shell.
Second, the shell assigned by this method cannot correctly reflect the real spreadability of
nodes in some cases. For instance, if a hub $i$ connects to a large number of tree-like
branches, the k-shell method will still assign the hub $k_s = 1$. However, if a node $j$ with
low degree forms only one triangle with other nodes, it will have $k_s = 2$. Clearly, node $i$
should perform far better than $j$ as a spreader in reality. These two limitations mean that
$k_s$ cannot be used to rank the spreadability of nodes accurately.
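
The peeling procedure and the hub-versus-triangle example above can be illustrated with a minimal Python sketch. The function names `k_shell` and `build_graph` and the toy graph are hypothetical and chosen only for illustration; the graph is stored as a dictionary of neighbor sets, which is an assumption of this sketch rather than anything prescribed by the method.

```python
def k_shell(adj):
    """Return {node: k-shell index} for an undirected graph {node: set_of_neighbors}."""
    residual = {v: len(nbrs) for v, nbrs in adj.items()}  # residual degree of each node
    remaining = set(adj)
    shell = {}
    while remaining:
        k = min(residual[v] for v in remaining)  # index of the shell being peeled
        stack = [v for v in remaining if residual[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:  # already removed via a duplicate stack entry
                continue
            remaining.remove(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    residual[u] -= 1
                    if residual[u] <= k:  # removal cascades within the same shell
                        stack.append(u)
    return shell


def build_graph(edges):
    """Build an undirected adjacency dictionary from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Hypothetical toy graph: hub "i" carries six leaves; "j" belongs to one triangle (j, x, y).
edges = [("i", f"leaf{n}") for n in range(6)]
edges += [("i", "j"), ("j", "x"), ("j", "y"), ("x", "y")]
adj = build_graph(edges)
shell = k_shell(adj)
print("hub i: degree", len(adj["i"]), "shell", shell["i"])    # degree 7, shell 1
print("node j: degree", len(adj["j"]), "shell", shell["j"])   # degree 3, shell 2
```

In this toy graph the hub has the largest degree but lands in the lowest shell, while the low-degree triangle node is ranked above it, which is the second limitation described above.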

The above-mentioned limitations of the k-shell method are due to entirely ignoring all the
links of the removed nodes when decomposing the networks. In this paper, we propose the
so-called mixed degree decomposition (MDD) procedure, in which both the residual degree and the
exhausted degree are taken into account. By simulating the epidemic process on real networks,
we show that the MDD ranks the spreadability of nodes more accurately than the k-shell and the
degree methods. Finally, we discuss how the structure of real networks affects the performance
of the MDD method.



II. METHOD

The k-shell method is a dynamical network decomposition procedure in which the residual degree
of the nodes is updated in each step. During the decomposition, all the information about the
removed nodes is dropped, so the method implicitly assumes that the remaining nodes are
homogeneously connected to the removed nodes. In other words, if the virus or information
reaches a certain layer, each node in this layer is assumed to spread the virus/information to
the same number of nodes in the lower layers, which is not true in reality. If a node in a low
layer connects to a big branch of removed nodes, not only should this node be ranked higher
than the other nodes in its layer, but it may also have stronger spreadability than some nodes
in the higher layers.

The analysis above requires us to take the information of the removed nodes into consideration
during the decomposition procedure. For a node $i$, we denote the residual degree (number of
links connecting to the remaining nodes) and the exhausted degree (number of links connecting
to the removed nodes) as $k_i^{(r)}$ and $k_i^{(e)}$, respectively. To achieve a more accurate
ranking of node spreadability, we propose a Mixed Degree Decomposition (MDD) procedure in which
the nodes are removed in each step according to the mixed degree

$$k_i^{(m)} = k_i^{(r)} + \lambda\, k_i^{(e)},$$

where $\lambda$ is a tunable parameter between 0 and 1. The detailed decomposition is done with
the following procedure:

1. Initially, $k^{(m)}$ of each node is equal to $k^{(r)}$, since there is no removed node in
the network.

2. Remove all the nodes with the smallest $k^{(m)}$ (denoted as $M$) and assign them to the
$M$-shell.

3. Update $k^{(m)}$ of all the remaining nodes by $k^{(m)} = k^{(r)} + \lambda\, k^{(e)}$.
Then, remove all the nodes with $k^{(m)}$ smaller than or equal to $M$ and assign them to
the $M$-shell too. This step is carried out recursively until $k^{(m)}$ of all the
remaining nodes is larger than $M$.

4. Repeat steps 2 and 3 as the value of $M$ increases until all nodes in the network have been
assigned to one of the shells.
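
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mdd`, the tolerance `eps` used for floating-point comparisons, the rounding of shell values, and the toy graph are all hypothetical choices made for this sketch.

```python
def mdd(adj, lam=0.7, eps=1e-9):
    """Mixed degree decomposition of an undirected graph {node: set_of_neighbors}.

    Returns {node: M}, where M is the mixed-degree value of the shell the node
    was assigned to. `eps` is an assumed tolerance for float comparisons.
    """
    residual = {v: len(nbrs) for v, nbrs in adj.items()}  # k^(r): links to remaining nodes
    exhausted = {v: 0 for v in adj}                        # k^(e): links to removed nodes
    remaining = set(adj)
    shell = {}

    def mixed(v):
        return residual[v] + lam * exhausted[v]            # k^(m) = k^(r) + lambda * k^(e)

    while remaining:
        m = min(mixed(v) for v in remaining)               # step 2: smallest mixed degree M
        while True:                                        # step 3: recursive removal
            batch = [v for v in remaining if mixed(v) <= m + eps]
            if not batch:
                break
            for v in batch:
                remaining.remove(v)
                shell[v] = round(m, 9)
            for v in batch:                                # update neighbors of removed nodes
                for u in adj[v]:
                    if u in remaining:
                        residual[u] -= 1
                        exhausted[u] += 1
    return shell


# Hypothetical toy graph: hub "i" with six leaves, and a triangle (j, x, y) attached to "i".
edges = [("i", f"leaf{n}") for n in range(6)]
edges += [("i", "j"), ("j", "x"), ("j", "y"), ("x", "y")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

for lam in (0.0, 0.7, 1.0):
    result = mdd(adj, lam)
    print(lam, {v: result[v] for v in ("i", "j", "x", "leaf0")})
```

In this sketch, `lam=0.0` leaves the exhausted links out entirely and `lam=1.0` keeps the mixed degree equal to the full degree throughout, while an intermediate value such as 0.7 lets the hub rise above the triangle nodes because its links to removed leaves still count partially.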

Clearly, when $\lambda = 0$, the MDD method reduces to the original k-shell method. When
$\lambda = 1$, the MDD method is equivalent to the degree centrality. Note that, unlike in the
original k-shell method, the shell values in the MDD method are no longer integers, since
$k^{(m)}$ can be fractional when $\lambda$ is between 0 and 1. To better illustrate the
procedure of the MDD, a simple example is shown in Fig. 1, in which the parameter $\lambda$ of
the MDD method is set to 0.7.

FIG. 1. (Color online) A simple example to illustrate the procedure of the Mixed Degree
Decomposition (MDD). The nodes and links drawn with dashed lines represent the removed nodes
and the exhausted links, respectively. Here, the parameter $\lambda$ in the MDD is set to 0.7.



III. RESULT 

To validate the effectiveness of the MDD method, we apply it to real networks, which include
social and nonsocial networks. The social networks are: Dolphins (friendship) [23], Jazz
(musical collaboration) [25], Netsci (collaboration network of network scientists) [26], Email
(communication) [23], HEP (collaboration network of high-energy physicists) [28], PGP (an
encrypted communication network) [29], Astro phys (collaboration network of astrophysics
scientists) [28] and Cond matt (collaboration network of condensed matter scientists) [28].
The nonsocial networks are: Word (adjacency relation in English text) [26], E. coli
(metabolic), C. elegans (neural) [31], TAP (yeast protein-protein binding network generated by
tandem affinity purification experiments) [34], Y2H (yeast protein-protein binding network
generated using yeast two-hybridization) [33], Power (connections between power stations) and
Internet (router level). To better illustrate the performance of the MDD method, we select
four relatively large networks (Email, PGP, Astro phys and Cond matt) as examples and show
their results in figures throughout the paper. The results for the other networks are reported
in detail in Table I.

As mentioned above, one of the limitations of the original k-shell method is that it is a
coarse-grained method which assigns many nodes to the same shell (which is equivalent to
giving them the same rank in spreadability). In Fig. 2, we show the frequency of the different
ranks in the k-shell method, the degree centrality method and the MDD method. The k-shell
method has only a limited number of ranks and the frequency of each rank is quite high, which
implies that differences between nodes are not well distinguished by the k-shell method. Using
the degree centrality to rank the nodes yields a larger number of ranks. In the MDD method,
nodes are ranked in more detail than in the previous two methods, and the number of ranks can
be even ten times larger than in the degree method. More importantly, the frequency of the top
rank is almost 1, which suggests that the top-ranked nodes are well separated. We also check
the performance of the MDD method on the tree network and the BA model, for which the k-shell
method cannot distinguish the nodes; the results show that the MDD can easily detect the
differences between nodes.

FIG. 2. (Color online) The frequency of different ranks in the k-shell method, the degree
centrality method and the MDD method ($\lambda = 0.7$). The networks are: (a) Email, (b) PGP,
(c) Astro phys and (d) Cond matt.

FIG. 3. (Color online) The value of $\tau$ for the k-shell, degree centrality and MDD methods
under different infection rates $p$ in the SIR model. The networks are: (a) Email, (b) PGP,
(c) Astro phys and (d) Cond matt. The results are averaged over 100 independent realizations.
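
The rank-frequency comparison described above amounts to counting how many nodes share each distinct score. A minimal Python sketch, in which the function name `rank_frequencies` and the two score dictionaries are hypothetical illustrations rather than data from the paper:

```python
from collections import Counter


def rank_frequencies(score):
    """Map each distinct score (rank) to the number of nodes that share it."""
    return Counter(score.values())


# Hypothetical scores for six nodes under a coarse-grained and a finer ranking.
coarse = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
fine = {"a": 1.0, "b": 1.0, "c": 1.7, "d": 2.0, "e": 2.4, "f": 4.9}

for name, score in (("coarse", coarse), ("fine", fine)):
    freq = rank_frequencies(score)
    print(name, "distinct ranks:", len(freq), "largest tie:", max(freq.values()))
```

A ranking with more distinct ranks and smaller ties separates the nodes better, which is the sense in which the three methods are compared in Fig. 2.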

All the rankings generated by the k-shell, degree and MDD methods are obtained by analyzing
the network topology. In principle, an effective topology-based ranking should be as close as
possible to the ranking by the real spreading coverage. In this paper, we employ the SIR model
[1] to simulate the spreading process on networks. The number of final infections resulting
from a given initially infected node $i$ is denoted as its spreadability $s_i$, where $p$ is
the infection rate in the SIR model. For each of the methods mentioned above, a final ranking
is generated. We therefore use Kendall's tau rank correlation coefficient ($\tau$) to estimate
how well a given topology-based ranking is correlated with the ranking by the true
spreadability $s$ of the nodes. In the ideal case where $\tau = 1$, for any two nodes $i$ and
$j$, if $i$ is ranked before $j$ by the topology-based method, $i$ also has stronger
spreadability than $j$. In Fig. 3, we show the value of $\tau$ for the k-shell, degree
centrality and MDD methods under different $p$. In this paper, we use relatively small values
of $p$, namely $p \in (0, 0.5]$, so that the infected percentage of the nodes is not too large.
For large values of $p$, where spreading can cover almost the whole network, the role of
individual nodes is no longer important, since the final coverage of the virus is independent
of where it originated. Interestingly, although the k-shell method is claimed to be able to
identify the most influential nodes, its $\tau$ value is not significantly higher than that of
the degree centrality method. Because of the two limitations pointed out in the introduction,
the k-shell method cannot effectively reflect the spreadability of the nodes with low rankings.
We again set $\lambda = 0.7$ in the MDD as an example and show its $\tau$ value in Fig. 3. As
can be seen, the MDD outperforms both the k-shell method and the degree centrality method for
all the values of $p$ we considered.
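
The spreadability measurement described above can be sketched with a small Monte Carlo simulation. The text does not spell out the SIR update rule, so this sketch assumes a common discrete-time variant: in each step every infected node infects each susceptible neighbor with probability `p` and then recovers. The function names, the choice to count the seed node in the outbreak size, and the toy path graph are assumptions made for illustration only.

```python
import random


def sir_outbreak_size(adj, seed_node, p, rng):
    """Run one SIR realization from a single seed; return the number of ever-infected nodes."""
    infected = {seed_node}
    recovered = set()
    while infected:
        newly_infected = set()
        for v in infected:
            for u in adj[v]:
                susceptible = (u not in infected and u not in recovered
                               and u not in newly_infected)
                if susceptible and rng.random() < p:
                    newly_infected.add(u)
        recovered |= infected      # assumed rule: infected nodes recover after one step
        infected = newly_infected
    return len(recovered)          # includes the seed node (an assumption of this sketch)


def spreadability(adj, p, runs=100, rng_seed=0):
    """Estimate s_i for every node i as the mean outbreak size over `runs` realizations."""
    rng = random.Random(rng_seed)
    return {v: sum(sir_outbreak_size(adj, v, p, rng) for _ in range(runs)) / runs
            for v in adj}


# Hypothetical toy graph: a path 0 - 1 - 2 - 3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(spreadability(adj, p=0.3))
```

The default of 100 runs mirrors the number of independent realizations quoted in the figure captions; counting or not counting the seed shifts every $s_i$ by the same constant and therefore does not change the ranking.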

In order to systematically study how the parameter $\lambda$ affects the performance of the
MDD method, we calculate $\langle\tau\rangle$ by summing $\tau$ over the different infection
rates $p$, namely $\langle\tau\rangle = \sum_{p} \tau(p)$, where the sum runs over the values
of $p$ considered above. In this way, we can investigate under which $\lambda$ the MDD
achieves the largest $\langle\tau\rangle$, which means that the MDD ranking most accurately
reflects the general spreadability of nodes. The results are shown in Fig. 4. Clearly, neither
the k-shell method ($\lambda = 0$) nor the degree centrality method ($\lambda = 1$) performs
well enough. However, $\langle\tau\rangle$ improves significantly when $\lambda$ is increased
slightly from 0 or decreased slightly from 1. Moreover, for each network there is an optimal
$\lambda^*$ at which the MDD method achieves the largest $\langle\tau\rangle$. The results for
the other real networks are reported in Table I. They show that the optimal $\lambda^*$
universally exists in both social and nonsocial networks.
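
The aggregate $\langle\tau\rangle$ described above can be sketched as follows. Two points are assumptions of this sketch rather than statements from the text: the tie handling (a simple tau-a, in which tied pairs count as neither concordant nor discordant) and the optional `normalize` flag that divides by the number of $p$ values. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def kendall_tau(score, spread):
    """Kendall's tau (tau-a variant) between two {node: value} dictionaries."""
    nodes = list(score)
    if len(nodes) < 2:
        raise ValueError("need at least two nodes")
    concordant = discordant = 0
    for a, b in combinations(nodes, 2):
        sign = (score[a] - score[b]) * (spread[a] - spread[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1        # tied pairs contribute to neither count
    total_pairs = len(nodes) * (len(nodes) - 1) / 2
    return (concordant - discordant) / total_pairs


def aggregate_tau(score, spread_by_p, normalize=False):
    """Sum tau(p) over all infection rates; optionally divide by the number of rates."""
    total = sum(kendall_tau(score, spread) for spread in spread_by_p.values())
    return total / len(spread_by_p) if normalize else total


# Hypothetical topology-based scores and spreadabilities at two infection rates.
score = {"a": 1.0, "b": 2.4, "c": 4.9}
spread_by_p = {
    0.1: {"a": 1.2, "b": 1.5, "c": 2.8},   # same order as the scores
    0.2: {"a": 1.9, "b": 1.6, "c": 3.4},   # "a" and "b" swapped
}
print(aggregate_tau(score, spread_by_p), aggregate_tau(score, spread_by_p, normalize=True))
```

Scanning `lam` over a grid, recomputing the scores for each value and keeping the one with the largest aggregate would give an estimate of the optimal $\lambda^*$ for a given network.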






TABLE I. Structure properties and ranking results of the different real networks. Structure
properties include network size ($N$), edge number ($E$), degree heterogeneity
($H = \langle k^2\rangle / \langle k\rangle^2$), degree assortativity ($r$), clustering
coefficient ($\langle C\rangle$) and average shortest path length ($\langle d\rangle$). Ranking
results are $\langle\tau\rangle$ for the k-shell, degree and MDD methods, together with the
optimal parameter $\lambda^*$.

                                                                                   $\langle\tau\rangle$
Network     $N$    $E$     $H$     $r$     $\langle C\rangle$  $\langle d\rangle$  k-shell  degree  MDD    $\lambda^*$
Dolphins    62     159     1.327   -0.044  0.259               3.357               0.563    0.710   0.751  0.62
Word        112    425     1.815   -0.129  0.173               2.536               0.713    0.803   0.816  0.74
Jazz        198    2742    1.395   0.020   0.618               2.235               0.484    0.526   0.530  0.68
E. coli     230    695     2.365   -0.015  0.224               3.784               0.702    0.683   0.721  0.27
C. elegans  297    2148    1.801   -0.163  0.292               2.455               0.614    0.693   0.701  0.66
Netsci      379    914     1.663   -0.082  0.741               6.042               0.453    0.509   0.532  0.59
Email       1133   5451    1.942   0.078   0.220               3.606               0.766    0.793   0.809  0.47
TAP         1373   6833    1.644   0.579   0.529               5.224               0.619    0.673   0.688  0.72
Y2H         1458   1948    2.667   -0.210  0.071               6.812               0.407    0.428   0.462  0.30
Power       4941   6594    1.450   0.004   0.080               18.989              0.348    0.506   0.536  0.55
HEP         5835   13815   1.926   0.185   0.506               7.026               0.535    0.537   0.581  0.38
PGP         10680  24316   4.147   0.238   0.266               7.463               0.457    0.453   0.480  0.24
Astro phys  14845  119652  2.820   0.228   0.670               4.847               0.731    0.736   0.753  0.50
Internet    22963  48436   61.978  -0.198  0.230               3.850               0.554    0.546   0.565  0.15
Cond matt   36458  171736  2.960   0.177   0.657               5.476               0.702    0.713   0.743  0.42
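
As a small illustration of one of the structure properties listed in Table I, the degree heterogeneity $H = \langle k^2\rangle / \langle k\rangle^2$ can be computed directly from the degree sequence. The function name `degree_heterogeneity` and the two toy graphs below are hypothetical and are not taken from the paper's data sets.

```python
def degree_heterogeneity(adj):
    """Return H = <k^2> / <k>^2 for an undirected graph {node: set_of_neighbors}."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    mean_k = sum(degrees) / len(degrees)
    mean_k2 = sum(k * k for k in degrees) / len(degrees)
    return mean_k2 / mean_k ** 2


# Hypothetical toy graphs: a triangle (all degrees equal) and a star with four leaves.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(degree_heterogeneity(triangle), degree_heterogeneity(star))
```

A graph in which all degrees are equal gives $H = 1$, and $H$ grows as the degree distribution becomes broader.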




FIG. 4. (Color online) The value of $\langle\tau\rangle$ of the MDD method under different
values of the parameter $\lambda$. The networks are: (a) Email, (b) PGP, (c) Astro phys and
(d) Cond matt. The results are averaged over 100 independent realizations.



We next investigate how the network structure influences the performance of the MDD method.
We first calculate the relative improvement in $\langle\tau\rangle$ as
$(\langle\tau\rangle_{MDD} - \langle\tau\rangle_{ks}) / \langle\tau\rangle_{ks}$. The relative
improvement can range from 5% to 60% in different networks (the absolute improvement of
$\langle\tau\rangle$ in some networks can be as large as 0.188). Here, we are interested in how
the relative improvement is affected by the network topology (mainly the degree heterogeneity,
assortativity, average shortest path length and clustering coefficient). The results are shown
as scatter plots in Fig. 5, in which each point represents a real network. The relative
improvement and the degree heterogeneity exhibit a clear negative correlation [Fig. 5(a)]. This
is quite straightforward, because the k-shell method is more likely to assign many nodes to the
same shell when the degree is homogeneous, while the MDD method can better distinguish those
nodes by considering the exhausted degree. From Fig. 5(b), we can see that large relative
improvements tend to occur in the region of negative assortativity. In networks with negative
assortativity, large-degree nodes tend to connect to low-degree nodes, in which case hubs
connecting many tree-like branches may be formed. The MDD method can therefore outperform the
k-shell method by ranking these hubs higher. As discussed in the introduction, the k-shell
method does not perform well in networks with low clustering, and it fails entirely in tree
networks and BA models. Accordingly, the improvement of the MDD method is larger in networks
with a low clustering coefficient, as shown in Fig. 5(c). Moreover, Fig. 5(d) shows the
relation between the relative improvement and the average shortest path length of the
networks. The result suggests that the MDD method tends to be more effective in networks with
a large diameter. Since real networks (especially online social networks) in modern society
are extremely large, the MDD should be a suitable method for applications.

FIG. 5. (Color online) The relation between the relative improvement in $\langle\tau\rangle$
and the network topology parameters, including (a) degree heterogeneity, (b) assortativity,
(c) clustering coefficient and (d) average shortest path length. The insets show the relation
between the optimal $\lambda^*$ and the network topology parameters. Each point in this figure
corresponds to a real network in Table I.
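
The improvement figures discussed above can be recomputed from the values listed in Table I. The sketch below rests on two assumptions: that the first and third $\langle\tau\rangle$ values in each table row are the k-shell and MDD results, respectively, and that the relative improvement is normalized by the k-shell value. The variable and function names are hypothetical.

```python
# (k-shell, MDD) values of the aggregated tau, copied from Table I under the
# assumption that they are the first and third ranking columns.
TAU = {
    "Dolphins": (0.563, 0.751), "Word": (0.713, 0.816), "Jazz": (0.484, 0.530),
    "E. coli": (0.702, 0.721), "C. elegans": (0.614, 0.701), "Netsci": (0.453, 0.532),
    "Email": (0.766, 0.809), "TAP": (0.619, 0.688), "Y2H": (0.407, 0.462),
    "Power": (0.348, 0.536), "HEP": (0.535, 0.581), "PGP": (0.457, 0.480),
    "Astro phys": (0.731, 0.753), "Internet": (0.554, 0.565), "Cond matt": (0.702, 0.743),
}


def improvements(tau_table):
    """Return {network: (absolute, relative)} improvement of MDD over k-shell."""
    result = {}
    for name, (tau_ks, tau_mdd) in tau_table.items():
        absolute = tau_mdd - tau_ks
        result[name] = (absolute, absolute / tau_ks)   # assumed normalization by k-shell
    return result


# Print the networks ordered from the largest to the smallest relative improvement.
for name, (absolute, relative) in sorted(improvements(TAU).items(),
                                         key=lambda item: -item[1][1]):
    print(f"{name:11s} absolute {absolute:.3f}  relative {relative:.1%}")
```

Pairing these values with the structure columns of Table I gives the points of the scatter plots in Fig. 5.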



We recall the result in Fig. 4 that $\langle\tau\rangle$ improves significantly when
$\lambda$ is increased slightly from 0 or decreased slightly from 1. This feature suggests that
the MDD method is very robust, since it outperforms the k-shell method and the degree method
over a large range of $\lambda$. Nevertheless, we also try to better understand the relation
between the optimal $\lambda^*$ and the network topology parameters; the results are reported
in the insets of Fig. 5. Although the correlation is not clear, some rough trends can be seen.
When the network has a heterogeneous degree distribution, negative assortativity, a low
clustering coefficient and a large shortest path length, a large $\lambda$ generally performs
better. The reason for this trend can be explained, at least in part, by the above analysis of
the relative improvement in $\langle\tau\rangle$.



IV. CONCLUSION 

The well-known k-shell method is able to identify the most influential spreaders in networks.
However, it has two limitations when it is used to rank the spreadability of nodes, which is
defined as the number of final infections resulting from a single initially infected node.
Specifically, it assigns the same rank to many nodes, and in some cases it may give a low rank
to a node with strong spreadability. We find that these limitations are due to entirely
ignoring the information of the exhausted links (i.e., links connecting the remaining nodes to
the removed nodes). Accordingly, we propose a Mixed Degree Decomposition (MDD) procedure with a
tunable parameter $\lambda$ to rank the spreadability of nodes in networks. By partially
considering the exhausted links, we show that the MDD method can significantly improve the
ranking accuracy for spreadability and that there is an optimal $\lambda$ for each network.
Moreover, the influence of the network structure on the performance of the MDD method is
investigated in detail.

Finally, although the MDD method can largely improve on the k-shell method in ranking
spreaders in complex networks, it is not the optimal way to address this problem. For example,
directly considering the number of possible spreading paths and weighting them with a proper
damping factor might give a more accurate spreadability ranking. However, such a method may
have a much higher computational complexity than the methods based on network decomposition.
Therefore, more effective and efficient methods still call for further investigation.